{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 11.3 Hash functions"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### 11.3-1\n",
    "\n",
    "> Suppose we wish to search a linked list of length $n$, where each element contains a key $k$ along with a hash value $h(k)$. Each key is a long character string. How might we take advantage of the hash values when searching the list for an element with a given key?"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Compare the long character strings only when they have the same hash values."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### 11.3-2\n",
    "\n",
    "> Suppose that we hash a string of $r$ characters into $m$ slots by treating it as a radix-128 number and then using the division method. We can easily represent the number $m$ as a 32-bit computer word, but the string of $r$ characters, treated as a radix-128 number, takes many words. How can we apply the division method to compute the hash value of the character string without using more than a constant number of words of storage outside the string itself?"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "We should calculate\n",
    "\n",
    "$$\\sum_{i=0}^{r-1} c_i \\cdot 128^i \\mod m$$\n",
    "\n",
    "It cannot be calculated with a constant number of words of storage because the sum may exceed 2^32 - 1. However, Equation 31.18 suggests\n",
    "\n",
    "$$\n",
    "\\begin{align*}\n",
    "\\sum_{i=0}^{r-1} c_i \\cdot 128^i\n",
    " &\\equiv \\sum_{i=0}^{r-1} (c_i \\cdot 128^i) \\bmod m \\pmod m \\\\\n",
    " &\\equiv \\sum_{i=0}^{r-1} (c_i \\cdot 128^i \\bmod m) \\pmod m \\\\\n",
    " &\\equiv \\sum_{i=1}^{r-1} (c_i \\cdot 128^i \\bmod m) + c_1 \\cdot 128 \\bmod m+ c_0 \\bmod m \\pmod m \\\\\n",
    " &\\equiv \\sum_{i=1}^{r-1} ((c_i \\cdot 128^{i-1} \\bmod m) + c_1 \\bmod m) \\bmod m \\cdot (128 \\bmod m) \\bmod m + c_0 \\bmod m \\pmod m \\\\\n",
    " &\\equiv ... \\\\\n",
    " &\\equiv (...(c_{r-1} \\bmod m \\cdot (128 \\bmod m) \\bmod m + c_{r-2} \\bmod m) \\bmod m \\cdot ... \\cdot (128 \\bmod m) + c_1 \\bmod m) \\bmod m + c_0 \\bmod m \\pmod m\n",
    "\\end{align*}\n",
    "$$\n",
    "\n",
    "It can be calculated with a loop.\n",
    "\n",
    "```\n",
    "sum := 0\n",
    "for i = 1 to r\n",
    "    sum := ((sum % m) * (128 % m) % m + s[i] % m) % m\n",
    "```\n",
    "\n",
    "And it fits in a word now. Futhermore, we may apply Equation 31.18 again and get\n",
    "\n",
    "```\n",
    "sum := 0\n",
    "for i = 1 to r\n",
    "    sum := (sum * 128 + s[i]) % m\n",
    "```\n",
    "\n",
    "Use `sum` as the key."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### 11.3-3\n",
    "\n",
    "> Consider a version of the division method in which $h(k) = k~\\text{mod}~m$, where $m = 2^p - 1$ and $k$ is a character string interpreted in radix $2^p$. Show that if we can derive string $x$ from string $y$ by permuting its characters, then $x$ and $y$ hash to the same value. Give an example of an application in which this property would be undesirable in a hash function."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "$$2^p ~\\text{mod}~ (2^p-1)=1$$\n",
    "\n",
    "$$c \\cdot (2^p)^x ~\\text{mod}~ (2^p-1)= c \\cdot 1^x = c$$\n",
    "\n",
    "Thus the hashing is equivalent to $(\\sum c_i)~\\text{mod}~m$, the strings with different permutations will have the same hashing value."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### 11.3-4\n",
    "\n",
    "> Consider a hash table of size $m=1000$ and a corresponding hash function $h(k) = \\lfloor m (kA ~\\text{mod}~ 1) \\rfloor$ for $A=(\\sqrt{5}-1)/2$. Compute the locations to which the keys 61, 62, 63, 64, and 65 are mapped."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "$h(61)=700$\n",
    "\n",
    "$h(62)=318$\n",
    "\n",
    "$h(63)=936$\n",
    "\n",
    "$h(64)=554$\n",
    "\n",
    "$h(65)=172$"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### 11.3-5 $\\star$\n",
    "\n",
    "> Define a family $\\mathcal{H}$ of hash functions from a finite set $U$ to a finite set $B$ to be __*$\\epsilon$-universal*__ if for all pairs of distinct elements $k$ and $l$ in $U$,\n",
    "\n",
    "> $\\text{Pr}\\{h(k)=h(l)\\} \\le \\epsilon$,\n",
    "\n",
    "> where the probability is over the choice of the hash function $h$ drawn at random from the family $\\mathcal{H}$. Show that an $\\epsilon$-universal family of hash functions must have\n",
    "\n",
    "> $\\displaystyle \\epsilon \\ge \\frac{1}{|B|} - \\frac{1}{|U|}$."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Suppose $n_i$ is the number of elements in slot $i$, then the total number of collisions is:\n",
    "\n",
    "Suppose $|U|=n$ and $|B|=m$\n",
    "\n",
    "$$\n",
    "\\begin{array}{rll}\n",
    "\\displaystyle \\sum_{i=1}^{m} \\frac{n_i (n_i - 1)}{2}\n",
    "&=& \\displaystyle \\frac{1}{2}\\sum_{i=1}^{m} n_i^2 - \\frac{1}{2}\\sum_{i=1}^{m} n_i \\\\\n",
    "&\\ge& \\displaystyle \\frac{1}{2}\\sum_{i=1}^{m} \\left (\\frac{n}{m} \\right )^2 - \\frac{n}{2} \\\\\n",
    "&=& \\displaystyle \\frac{n^2}{2{m}} - \\frac{n}{2} \\\\\n",
    "\\end{array}\n",
    "$$\n",
    "\n",
    "$$\n",
    "\\begin{array}{rll}\n",
    "\\displaystyle \\epsilon &\\ge& \\displaystyle \\frac{\\displaystyle \\sum_{i=1}^{m} \\frac{n_i (n_i - 1)}{2}}{\\displaystyle \\frac{n(n-1)}{2}} \\\\\n",
    "&=& \\displaystyle \\frac{\\displaystyle \\frac{n^2}{m} - n}{n(n-1)} \\\\\n",
    "&=& \\displaystyle \\frac{n - m}{m(n - 1)} \\\\\n",
    "&=& \\displaystyle \\frac{n}{m(n - 1)} - \\frac{1}{n - 1} \\\\\n",
    "&\\ge& \\displaystyle \\frac{n}{mn} - \\frac{1}{n} \\\\\n",
    "&=& \\displaystyle \\frac{1}{m} - \\frac{1}{n} \\\\\n",
    "\\end{array}\n",
    "$$\n",
    "\n",
    "Therefore $\\displaystyle \\epsilon \\ge \\frac{1}{|B|} - \\frac{1}{|U|}$."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### 11.3-6 $\\star$\n",
    "\n",
    "> Let $U$ be the set of $n$-tuples of values drawn from $\\mathbb{Z}_p$, and let $B=\\mathbb{Z}_p$, where $p$ is prime. Define the hash function $h_b$: $U \\rightarrow B$ for $b \\in \\mathbb{Z}_p$ on an input $n$-tuple $\\langle a_0, a_1, \\dots, a_{n-1} \\rangle$ from $U$ as\n",
    "\n",
    "> $\\displaystyle h_b(\\langle a_0, a_1, \\dots, a_{n-1} \\rangle)=\\left ( \\sum_{j=0}^{n-1}a_jb^j \\right ) ~\\text{mod}~p$,\n",
    "\n",
    "> and let $\\mathcal{H}=\\{ h_b : b \\in \\mathbb{Z}_p \\}$. Argue that $\\mathcal{H}$ is $((n-1)/p)$-universal according to the definition of $\\epsilon$-universal in Exercise 11.3-5."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "$\\dots$"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.6.3"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 2
}
